Work Environment Survey (WES)
conducted by BC Stats for employees within BC Public Service
measures the health of work environments and identifies areas for improvement
~80 multiple choice questions (5 point scale) and 2 open-ended questions
2020-06-19
Question 1
Example: "Better health and social benefits should be provided."
Question 2
Example: "Now we have more efficient vending machines."
*Note: these examples are fake comments for privacy reasons.
What one thing would you like your organization to focus on to improve your work environment?
| Comments* | CPD | CB | EWC | … | CB_Improve_benefits | CB_Increase_salary |
|---|---|---|---|---|---|---|
| Better health and social benefits should be provided | 0 | 1 | 0 | … | 1 | 0 |
Theme: CB = Compensation and Benefits
Sub-theme: CB_Improve_benefits = Improve benefits
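The one-hot label columns above are a multi-label indicator encoding: each comment can carry several themes at once. A minimal sketch with scikit-learn's `MultiLabelBinarizer` (theme codes are from the table; the three-theme subset is illustrative, the real data has 12 themes):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each comment can be tagged with zero or more themes.
comments = ["Better health and social benefits should be provided"]
labels = [["CB"]]  # this example comment is tagged Compensation and Benefits

mlb = MultiLabelBinarizer(classes=["CPD", "CB", "EWC"])  # subset of the 12 themes
Y = mlb.fit_transform(labels)
print(Y)  # one row per comment, one indicator column per theme: [[0 1 0]]
```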
Question 1:
31,000+ labelled comments from 2013, 2018, and 2020
12,000+ additional comments from 2015
Question 2:
6,000+ labelled comments from 2018
9,000+ additional comments from 2015 and 2020
*Note: this is a fake comment as an example of the data.
# 1) Build a model to automate multi-label text classification that:
predicts label(s) for the main themes of Questions 1 and 2
predicts label(s) for Question 1's sub-themes
# 2) Visualizations on discovery of text analysis:
mapping words for both questions to identify common terms
identify potential needs & resolutions using sentiment analysis
identify theme trends across ministries over given years
There are 12 themes and 63 subthemes that comments can be encoded into.
Imbalanced data in each theme
Example comment to get flagged: "George and I love when the department gives us new coupons!"
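A comment like the one above would be flagged because it contains a personal name. The source doesn't specify the flagging method; one hypothetical rule is to check tokens against a list of known names (the list and function are illustrative):

```python
import re

# Hypothetical flagging rule: set a comment aside for manual review if it
# contains a token from a list of known first names (list is illustrative).
KNOWN_NAMES = {"george"}

def flag_sensitive(comment: str) -> bool:
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(tok in KNOWN_NAMES for tok in tokens)

print(flag_sensitive("George and I love when the department gives us new coupons!"))  # True
```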
TF-IDF Vectorizer weights tokens by inverse document frequency instead of using raw token counts (CountVectorizer)
Source: Multi-Label Classification: Classifier Chains, by Analytics Vidhya
explored several embeddings on various models
built embedding matrix & maximized vocab coverage for each embedding
padded comments to a fixed sequence length to match the embedding input size
converted comments to embeddings to strip sensitive text, allowing upload to public cloud services for our advanced models
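The embedding-matrix and padding steps above can be sketched in NumPy (the vocabulary, sequence length, and stand-in vectors are all illustrative; the real matrix is filled from pretrained fastText vectors):

```python
import numpy as np

# Illustrative vocabulary; a real one is built from the training comments.
vocab = {"<pad>": 0, "better": 1, "benefits": 2, "please": 3}
MAX_LEN = 6

def encode_and_pad(comment: str) -> np.ndarray:
    """Map tokens to integer ids, then pad to a fixed length."""
    ids = [vocab.get(tok, 0) for tok in comment.lower().split()]
    ids = ids[:MAX_LEN] + [0] * max(0, MAX_LEN - len(ids))
    return np.array(ids)

# Embedding matrix: one row of pretrained vector per vocabulary id.
EMB_DIM = 4
pretrained = {"better": np.ones(EMB_DIM)}  # stand-in for fastText vectors
emb_matrix = np.zeros((len(vocab), EMB_DIM))
for word, idx in vocab.items():
    if word in pretrained:
        emb_matrix[idx] = pretrained[word]

print(encode_and_pad("better benefits please"))  # [1 2 3 0 0 0]
```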
Precision-Recall curve: plots precision vs. recall at various classification thresholds
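scikit-learn computes the curve points directly from true labels and scores (the labels and scores below are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Illustrative true labels and classifier scores for one theme.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (precision, recall) pair per decision threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(precision, recall)
```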
Source: Precision and Recall
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| TF-IDF + LinearSVC | 0.50 | 0.79 | 0.63 | 0.70 |
| Fasttext + BiGru | 0.54 | 0.75 | 0.71 | 0.73 |
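The TF-IDF + LinearSVC baseline can be sketched as a scikit-learn pipeline, with one binary classifier per theme via `OneVsRestClassifier` (the training comments, the two-theme subset, and the labels below are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy multi-label training set; columns of Y are indicators for CPD and CB.
X = ["improve our benefits", "more training please", "better pay and benefits"]
Y = [[0, 1], [1, 0], [0, 1]]

clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
clf.fit(X, Y)

# Prediction is a label-indicator row per input comment.
pred = clf.predict(["increase benefits"])
print(pred)
```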
2019 Capstone team's results
| Model | Accuracy | Precision | Recall |
|---|---|---|---|
| Bag of Words + LinearSVC | 0.45 | 0.74 | 0.64 |
| Fasttext + BiGru | 0.53 | 0.83 | 0.66 |
Source: BC Stats Capstone 2019-Final Report, by A. Quinton, A. Pearson, F. Nie
| Theme | Accuracy | Precision | Recall |
|---|---|---|---|
| CPD | 0.94 | 0.77 | 0.79 |
| CB | 0.97 | 0.90 | 0.90 |
| EWC | 0.94 | 0.69 | 0.56 |
| Exec | 0.92 | 0.64 | 0.71 |
| FEW | 0.97 | 0.73 | 0.77 |
| SP | 0.95 | 0.76 | 0.75 |
| Theme | Accuracy | Precision | Recall |
|---|---|---|---|
| RE | 0.94 | 0.69 | 0.51 |
| Sup | 0.92 | 0.66 | 0.57 |
| SW | 0.92 | 0.74 | 0.65 |
| TEPE | 0.95 | 0.92 | 0.85 |
| VMG | 0.90 | 0.62 | 0.66 |
| OTH | 0.96 | 0.43 | 0.29 |
Subthemes are predicted based on the theme(s) our model has assigned to the comment.
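That two-stage scheme, in which sub-theme classifiers only run for themes the first stage assigned, can be sketched as follows (the theme-to-sub-theme mapping is partly hypothetical: `CB_Improve_benefits` and `CB_Increase_salary` appear in the data table above, while `CPD_Provide_training` and the always-accepting scorer are illustrative stand-ins):

```python
# Mapping from each theme to its candidate sub-themes (partly hypothetical).
SUBTHEMES = {
    "CB": ["CB_Improve_benefits", "CB_Increase_salary"],
    "CPD": ["CPD_Provide_training"],
}

def predict_subthemes(predicted_themes, subtheme_scorer):
    """Only score sub-themes belonging to themes the first-stage model assigned."""
    out = []
    for theme in predicted_themes:
        for sub in SUBTHEMES.get(theme, []):
            if subtheme_scorer(sub):
                out.append(sub)
    return out

# Stand-in scorer that accepts everything; a real one is a trained classifier.
result = predict_subthemes(["CB"], lambda sub: True)
print(result)  # ['CB_Improve_benefits', 'CB_Increase_salary']
```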
observed better results with more data
Try BERT (could not generate embeddings because the sensitive data could not be uploaded to cloud platforms)
using embeddings and padded training & validation data on public cloud services (Google Colab, AWS) paves the way for applying more complex machine learning algorithms to sensitive data
Topic modelling for Question 2 could be tried after removing commonly repeated words